Here we have four clusters. Cluster 0 (green) has lower salaries, mostly under $150k, with maximum years of experience between 2 and 5; these are likely junior to mid-level employees with moderate pay. Cluster 1 (orange) has medium to high salaries spanning a wide range, from $100k to $500k, within a narrow experience band of about 3 years. This suggests specialized or high-paying roles with short experience, possibly fast-track promotions or high-demand fields. Cluster 2 has low salaries and 0-4 years of experience, consistent with entry-level employees. Cluster 3 has medium salaries, mostly under $200k, with more experience, roughly 6-13 years. These are probably senior professionals who have more experience but not the highest salaries.
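One way to check these readings against numbers instead of the scatter plot alone is a per-cluster summary table. The sketch below is only illustrative: the `cluster` column name and every value in the toy frame are assumptions, not the report's data. Only the `SALARY` and `MAX_YEARS_EXPERIENCE` column names come from the analysis.

```python
import pandas as pd

# Toy stand-in for the clustered data. The "cluster" column name and all
# values here are hypothetical and exist only to make the sketch runnable.
df = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 2, 2, 3, 3],
    "SALARY": [90_000, 120_000, 150_000, 400_000, 55_000, 70_000, 130_000, 180_000],
    "MAX_YEARS_EXPERIENCE": [2, 5, 3, 3, 0, 4, 6, 13],
})

# One row per cluster: size, salary spread, and experience range.
profile = df.groupby("cluster").agg(
    n=("SALARY", "size"),
    salary_median=("SALARY", "median"),
    salary_min=("SALARY", "min"),
    salary_max=("SALARY", "max"),
    exp_min=("MAX_YEARS_EXPERIENCE", "min"),
    exp_max=("MAX_YEARS_EXPERIENCE", "max"),
)
print(profile)
```

Applied to the real frame with its actual cluster labels, a table like this would show whether the salary and experience ranges quoted for each cluster hold up.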
Code
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import plotly.graph_objects as go

# Prepare features & target
features = eda[['MIN_YEARS_EXPERIENCE', 'MAX_YEARS_EXPERIENCE']].apply(pd.to_numeric, errors='coerce')
features = features.dropna()
X = features
y = eda.loc[X.index, 'SALARY']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=688)

# Fit model & predict
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Metrics (optional, but handy)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.2f}, R²: {r2:.3f}")

# Define min/max for the identity line
min_val = y_test.min()
max_val = y_test.max()
MSE: 758547543.20, R²: 0.096
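The MSE is in squared salary units, which makes it hard to read directly. As a small sketch, assuming `SALARY` is in dollars (the report does not state the unit), taking the square root of the printed MSE gives an error in the same units as salary, and `1 - R²` gives the share of variance the model leaves unexplained.

```python
import math

# Values copied from the printed output above.
mse = 758547543.20
r2 = 0.096

# RMSE is in the same units as SALARY (assumed here to be dollars).
rmse = math.sqrt(mse)

# Share of salary variance the model does not explain.
unexplained = 1 - r2

print(f"RMSE: {rmse:,.0f}")
print(f"Unexplained variance: {unexplained:.1%}")
```

The RMSE comes out at roughly 27,500, and about 90% of the variance in salary is left unexplained by the two experience features.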
Code
import numpy as np
import plotly.graph_objects as go

# Apply log transform
y_test_log = np.log1p(y_test)
y_pred_log = np.log1p(y_pred)

# Plot
fig = go.Figure([
    go.Scatter(
        x=y_test_log,
        y=y_pred_log,
        mode='markers',
        marker=dict(color='skyblue', opacity=0.6),
        name='Predicted vs Actual (Log Scale)'
    ),
    go.Scatter(
        x=[min(y_test_log), max(y_test_log)],
        y=[min(y_test_log), max(y_test_log)],
        mode='lines',
        line=dict(color='red', dash='dash'),
        name='Ideal Fit'
    )
])
fig.update_layout(
    autosize=True,
    height=400,
    title="Predicted vs Actual Salary (Log Scale)",
    xaxis_title="Actual Log(Salary)",
    yaxis_title="Predicted Log(Salary)",
    margin=dict(l=20, r=20, t=50, b=20)
)
fig.write_html("figures/analytics_plot2.html", full_html=False, include_plotlyjs="cdn", config={"responsive": True})
fig.show()
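This plot shows actual versus predicted salary, on a log scale, for a multiple linear regression on minimum and maximum years of experience. The blue dots are individual test-set predictions, and the red dashed line is the ideal line where predicted = actual. The metrics printed above indicate a weak fit, not an accurate one: an R² of 0.096 means the model explains under 10% of the variance in salary, and an MSE of about 7.59 × 10^8 corresponds to an RMSE of roughly 27,500 in salary units. With a fit this weak, the predictions are expected to sit in a narrow band around the average salary rather than follow the red line, so years of experience alone are not enough to predict salary accurately. Note also that the log transform is applied only for plotting; the model itself was fit on raw salary.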
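To make the link between R² and a plot of this kind concrete, here is a minimal sketch using made-up salaries, not the report's data. It computes R² by hand, using the same definition that `r2_score` implements, for three sets of predictions: a constant mean baseline, predictions that shrink strongly toward the mean, and predictions that track the actual values closely.

```python
import numpy as np

def r2(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, i.e. improvement over predicting the mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical salaries, for illustration only.
y_true = np.array([60_000, 80_000, 100_000, 150_000, 250_000], dtype=float)

# Always predicting the mean: the reference point, R² = 0.
baseline = np.full_like(y_true, y_true.mean())

# Predictions that move only 10% of the way from the mean toward the truth.
weak = y_true.mean() + 0.1 * (y_true - y_true.mean())

# Predictions within 2% of the truth: these would hug the identity line.
near_perfect = y_true * 1.02

for name, pred in [("baseline", baseline), ("weak", weak), ("near_perfect", near_perfect)]:
    print(f"{name}: R² = {r2(y_true, pred):.3f}")
```

Only the last case produces points close to the identity line. An R² near 0.1 behaves like the `weak` case, where predictions form a nearly flat band around the mean.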